Change last year.
Change the year before!
Expect Change this year.
Unlike any change of yore?
To detect change, without fear,
CUSUM process your score . . .
Unity makes practice of statistics clear.
Who could ask for anything more?

0.  GOALS 

Ultimate goals of our research program include: unify parametric and nonparametric
inference for continuous and discrete data; synthesize classical statistical methods and
changepoint hypothesis testing; demonstrate that mathematical statistical and data
analytic approaches are both needed for statistical inference; stimulate exoteric methods
(applicable by applied researchers) rather than esoteric methods (known only to a small
group of mathematical statisticians); combine mathematical statistical and data analytic
views to develop methods of statistical analysis which are based on assumptions (known
model) which are tested in ways that provide insight into how to model deviations of the data
from the assumed model (and thus often identify a “true” model as an “iterated” model
which models “residuals”); contribute to solutions of the historical basic applied problem
of statistics: distinguish change (of the model) from fluctuation (within the model), the
variability expected under homogeneity.

This paper is not a finished or rigorous presentation of results; it is a stimulus for
discussion about open research problems in change analysis. One need may be to determine
how to develop a classification scheme to catalog the past and future extensive literature
about statistical methods to model change.
1.  COMPARISON  CHANGE  ANALYSIS  AS  PROBABILITY  STUDY  OF

This section outlines the notation and concepts that we introduce (Parzen (1992))
in our probability theory of the relations between two random variables $X$ and $Y$. They
motivate the statistics that we propose to describe the changes over time of a series of
observations $Y(t)$, $t = 1, 2, \ldots$. To apply the probability theory of $(X, Y)$ to data, let $X$
represent $t$, the index of observation.

The distribution function, quantile function, probability mass function, and probability
density function of $Y$ are respectively denoted $F_Y(y)$, $Q_Y(u)$, $p_Y(y)$, $f_Y(y)$. We
assume that $Y$ is either discrete or continuous, and that $X$ is either discrete or continuous.

To develop a theory that applies to both discrete and continuous variables we define
$\tau$, $0 < \tau < 1$, to be an $X$-exact value if

$$F_X(Q_X(\tau)) = \tau.$$

If $F_X(\cdot)$ is continuous, all $\tau$ are $X$-exact. If $F_X(\cdot)$ is discrete, $\tau$ is $X$-exact if there exists
a value $x$ at which $F_X$ jumps and $x = Q_X(\tau)$ (therefore $F_X(x) = \tau$).

Let $U$ denote a random variable which is Uniform[0,1]. If $Y$ is continuous, the probability
integral transform $F_Y(Y)$ is identically distributed as $U$. If $Y$ is discrete we transform
$Y$ to

$$F_Y^{\mathrm{mid}}(Y) = F_Y(Y) - .5\,p_Y(Y).$$

If $u$ is $Y$-exact,

$$\mathrm{Prob}[F_Y^{\mathrm{mid}}(Y) \le u] = u = \mathrm{Prob}[Y \le Q_Y(u)].$$
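The mid-distribution transform can be illustrated on a small discrete sample. The following is only a sketch: it replaces $F_Y$ and $p_Y$ by their sample versions, and the function name `mid_distribution` is a hypothetical helper introduced for illustration, not notation from the paper.

```python
from collections import Counter

def mid_distribution(sample):
    """Return F_mid(y) = F(y) - 0.5 * p(y) for each observation.

    F and p are the *sample* distribution and probability mass
    functions (an assumption made for this illustration).
    """
    n = len(sample)
    counts = Counter(sample)
    cumulative = 0.0
    f_mid = {}
    for value in sorted(counts):
        p = counts[value] / n                 # sample mass at this value
        cumulative += p                       # sample distribution function
        f_mid[value] = cumulative - 0.5 * p   # mid-distribution value
    return [f_mid[y] for y in sample]

scores = mid_distribution([1, 1, 2, 3])
print(scores)  # [0.25, 0.25, 0.625, 0.875]
```

Tied values receive the same score, and the scores average to .5, which is the property that lets tied and discrete data be handled without special cases.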

A function $J(u)$, $0 < u < 1$, is called a score function (to be more precise, a $Y$-score
function); it is called normalized if

$$\int_0^1 J(u)\,du = 0, \qquad \int_0^1 J^2(u)\,du = 1.$$

Score change density and score change process: Define for $0 < \tau < 1$

$$c(\tau, J) = E[J(F_Y^{\mathrm{mid}}(Y)) \mid X = Q_X(\tau)] - E[J(U)],$$

$$C(\tau, J) = \int_0^\tau c(t, J)\,dt.$$

For a sequence $Y(t)$, $t = 1, \ldots, n$, analogous concepts are, defining

$$\tilde{Y} = (1/n)\sum_{t=1}^n Y(t), \qquad \tilde{f} = (1/n)\sum_{t=1}^n f(Y(t)),$$

the sample change density

$$\tilde{c}(\tau) = Y(j) - \tilde{Y}, \qquad (j-1)/n < \tau \le j/n, \quad j = 1, \ldots, n,$$

and the sample change process

$$\tilde{C}(\tau) = \int_0^\tau \tilde{c}(s)\,ds.$$
Patterns in these change processes will be examined by computing linear functionals
for suitable change-score functions $K(\tau)$, $0 < \tau < 1$. Define

$$[c(\cdot, J), K] = [J, K] = \int_0^1 c(\tau, J)\,K(\tau)\,d\tau.$$

We call $[J, K]$ a double score component. It measures how $c(\tau, J)$ behaves as a function of
$\tau$ (for $J$ fixed).

Change Theorem C: $C(\tau, J)$ linearly interpolates its values at $X$-exact values of $\tau$,
where it satisfies

$$C(\tau, J) = \tau\left(E[J(F_Y^{\mathrm{mid}}(Y)) \mid X \le Q_X(\tau)] - E[J(U)]\right).$$

The proof of Change Theorem C requires the methodology (Parzen (1979), (1991),
(1992), (1993)) of comparison density functions $d(u; F, G)$ and comparison distributions
$D(u; F, G)$; they compare two distributions $F$ and $G$ which are either discrete or continuous.
$D(u)$ is defined as the integral of $d(u)$, $d(u) = D'(u)$. When $d(u)$ is piecewise constant,
$D(u)$ is piecewise linear. When $F$ and $G$ are both continuous we define
$D(u; F, G) = G(F^{-1}(u))$.

Change Dependence Densities and Distributions: define, for $0 < \tau, u < 1$,

$$d(\tau, u) = d(u; F_Y, F_{Y \mid X = Q_X(\tau)}),$$

$$d([0, \tau], u) = d(u; F_Y, F_{Y \mid X \le Q_X(\tau)}),$$

$$D(\tau, u) = D(u; F_Y, F_{Y \mid X = Q_X(\tau)}),$$

$$D([0, \tau], u) = D(u; F_Y, F_{Y \mid X \le Q_X(\tau)}).$$

Best Change Theorem D: For $\tau$ $X$-exact and $u$ $Y$-exact,

$$\tau\, d([0, \tau], u) = \int_0^\tau d(t, u)\,dt.$$

We call this theorem best because it explains why estimators of $\tau d([0, \tau], u)$ for fixed $\tau$
have asymptotic variances similar to those of probabilities rather than densities, and it
yields proofs of all change theorems stated. The proof of Change Theorem D is outlined
in Parzen (1992).

Change Theorem E:

$$c(\tau, J) = \int_0^1 J(u)\,(d(\tau, u) - 1)\,du,$$

$$C(\tau, J) = \tau \int_0^1 J(u)\,(d([0, \tau], u) - 1)\,du,$$

$$[J, K] = \int_0^1\!\!\int_0^1 K(\tau)\,J(u)\,(d(\tau, u) - 1)\,du\,d\tau.$$

Important score functions are indicator score functions $J(\cdot; u)$: $J(u'; u) = 1$ or $0$ as
$u' \le u$ or $u' > u$. Assume $u$ is $Y$-exact. Denote by $c(\tau, u)$ and $C(\tau, u)$ the change density
and change process of $J(\cdot; u)$:

$$c(\tau, u) = \mathrm{Prob}[F_Y^{\mathrm{mid}}(Y) \le u \mid X = Q_X(\tau)] - u,$$

$$C(\tau, u) = \tau\left(\mathrm{Prob}[F_Y^{\mathrm{mid}}(Y) \le u \mid X \le Q_X(\tau)] - u\right).$$
Change Theorem F: At $X$-exact $\tau$ and $Y$-exact $u$,

$$C(\tau, u) = \tau\,(D([0, \tau], u) - u),$$

$$C(\tau, u) = D(\tau, u) - \tau u.$$

Another important score function is $J(u) = Q_Y(u)$. Its change density and change
process correspond to conditional means of $Y$:

$$c(\tau, Q_Y) = E[Y \mid X = Q_X(\tau)] - E[Y],$$

$$C(\tau, Q_Y) = \tau\,(E[Y \mid X \le Q_X(\tau)] - E[Y]).$$

Measures of dependence can be defined in terms of

$$\int_0^1 |c(\tau, Q_Y)|^2\,d\tau = \mathrm{VAR}(E[Y \mid X]),$$

$$\int_0^1 C(\tau, Q_Y)\,d\tau = \int_0^1 -(s - .5)\,c(s, Q_Y)\,ds.$$

When $X$ and $Y$ are jointly normal with correlation coefficient $\rho$,

$$E[Y - E[Y] \mid X] = (\sigma[Y]/\sigma[X])\,\rho\,(X - E[X]).$$

Therefore the change density of $Y$ given $X$, when $(X, Y)$ is bivariate normal, is

$$c(\tau, Q_Y) = \sigma[Y]\,\rho\,\Phi^{-1}(\tau).$$

Its integral is $C(\tau, Q_Y)$, whose graph has the typical shape of a change process which is
able to detect whether there is a change in $Y$ as a function of $X$.
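The shape of the bivariate normal change process can be checked numerically. The sketch below assumes the change density $\sigma[Y]\rho\,\Phi^{-1}(\tau)$ stated above; substituting $t = \Phi(z)$ in its integral gives the closed form $-\sigma[Y]\rho\,\phi(\Phi^{-1}(\tau))$, which the code compares with a midpoint-rule integral. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: Phi, its inverse, and phi

def change_density(tau, sigma_y, rho):
    """c(tau, Q_Y) = sigma[Y] * rho * Phi^{-1}(tau) for bivariate normal (X, Y)."""
    return sigma_y * rho * STD_NORMAL.inv_cdf(tau)

def change_process_closed_form(tau, sigma_y, rho):
    """Integral of the change density over (0, tau): -sigma * rho * phi(Phi^{-1}(tau))."""
    return -sigma_y * rho * STD_NORMAL.pdf(STD_NORMAL.inv_cdf(tau))

def change_process_numeric(tau, sigma_y, rho, steps=20000):
    """Midpoint-rule approximation of the same integral (illustration only)."""
    h = tau / steps
    return h * sum(change_density((k + 0.5) * h, sigma_y, rho) for k in range(steps))

closed = change_process_closed_form(0.5, 2.0, 0.5)
numeric = change_process_numeric(0.5, 2.0, 0.5)
```

For positive $\rho$ the process is a smooth arch below zero that returns to zero at $\tau = 1$, with its extreme value at $\tau = .5$; for negative $\rho$ the arch is reflected above zero.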

To test the independence of $X$ and $Y$ one examines change processes $c(\tau, g(Q_Y))$ for
several transformations $g$, which correspond to conditional means of non-linear functions
of $Y$. The problem in practice is how to choose informative non-linear functions.

If one assumes a parametric model $f_\theta(y)$ for the true density $f_Y(y)$, where $\theta$ is a vector
parameter with components $\theta_j$, one chooses non-linear functions of $Y$ equal to Fisher-score
functions, defined by

$$S_j(y, \theta) = (\partial/\partial\theta_j)\log f_\theta(y).$$

Fisher-score change densities are defined to be $c(\tau, S_j(Q_Y, \theta))$.

They are called parametric change densities in contrast with $c(\tau, J)$, which are
non-parametric change densities.

2.  ASYMPTOTIC  DISTRIBUTIONS  OF  SAMPLE  CHANGE  PROCESSES 
Empirical process theory studies limit theorems for

$$\tilde{C}(f) = (1/n)\sum_{t=1}^n \left(f(Y(t)) - E[f(Y(t))]\right)$$

uniform in $f$ belonging to a specified family of functions $f(y)$.

Empirical change processes theory studies limit theorems for

$$\tilde{C}(\tau, f) = (1/n)\sum_{t=1}^{[n\tau]} \left(f(Y(t)) - \tilde{f}\right),$$

where $\tilde{f}$ is the sample mean of $f(Y(t))$.

Sample versions of change processes $C(\tau, J)$ and $C(\tau, u)$ computed from a sample (of
size $n$) are denoted $\tilde{C}(\tau, J)$ and $\tilde{C}(\tau, u)$; using theorems in the literature (especially Csörgő
and Horváth (1987)) one can show that they have large sample asymptotic distributions
under the hypothesis of no change (where $B(\tau)$ is a Brownian Bridge and $B(\tau, u)$ is a
Brownian sheet):

$n^{.5}\tilde{C}(\tau, \tilde{Q}_Y)$, $0 < \tau < 1$, converges to $B(\tau)$, $0 < \tau < 1$;

$n^{.5}\tilde{C}(\tau, J)$, $0 < \tau < 1$, converges to $B(\tau)$, $0 < \tau < 1$, assuming $J$ normalized;

$n^{.5}\tilde{C}(\tau, u)$, $0 < \tau, u < 1$, converges to $B(\tau, u)$, $0 < \tau, u < 1$;

for fixed $\tau$, $(n/\tau(1 - \tau))^{.5}\,\tilde{C}(\tau, u)$, $0 < u < 1$, converges to $B(u)$, $0 < u < 1$.
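A minimal sketch of a sample change process follows. It assumes, for illustration, that the observations are centered at the sample mean and scaled by the sample standard deviation (divisor $n$) so that $n^{.5}\tilde{C}(\tau)$ is comparable with a Brownian Bridge; the function name is hypothetical.

```python
import math

def sample_change_process(y):
    """Return n^{.5} * C~(j/n) for j = 0, ..., n.

    C~(j/n) is the cumulative sum of standardized observations divided by n.
    Standardizing by the sample s.d. is an assumption of this sketch.
    """
    n = len(y)
    mean = sum(y) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in y) / n)
    path, cusum = [0.0], 0.0
    for v in y:
        cusum += (v - mean) / sd
        path.append(cusum / math.sqrt(n))  # n^{.5} * (1/n) * partial sum
    return path

path = sample_change_process([0, 0, 0, 0, 1, 1, 1, 1])
turning_point = min(range(len(path)), key=lambda j: path[j])
print(turning_point, path[turning_point])  # 4 -1.414...
```

The path starts and ends at zero, like a bridge, and its extreme value sits at the time of the level shift. Under the hypothesis of no change the supremum of a Brownian Bridge exceeds about 1.36 with probability .05, although with only eight observations that asymptotic yardstick is a rough guide at best.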

Parzen and Horváth (research in progress) establish similar asymptotic theorems for
Fisher-score change processes

$$n^{.5}\,\tilde{C}(\tau, S_j(Q_Y, \hat{\theta})),$$

where $\hat{\theta}$ is a maximum likelihood estimator.

For comparison distributions, the asymptotic distributions under the null hypothesis
of no change are

$n^{.5}(\tilde{D}(\tau, u) - \tau u)$ converges to $B(\tau, u)$,

$n^{.5}\,\tau\,(\tilde{D}([0, \tau], u) - u)$ converges to $B(\tau, u)$.

The Pyke-Shorack two sample process can be expressed: for fixed $\tau$, as $n$ tends to $\infty$,
$(n\tau/(1 - \tau))^{.5}(\tilde{D}([0, \tau], u) - u)$ converges to $B(u)$.

The sample distribution function $\tilde{F}$ of a sample (of size $n_1$) from true distribution
$F$ can be studied as the limit of two samples as $\tau \to 0$, first sample size $n_1 = n\tau \to \infty$;
empirical processes can be expressed

$n_1^{.5}(D(u; F, \tilde{F}) - u)$ converges to $B(u)$.

The foregoing are asymptotic distributions of sample change processes under the null
hypothesis of no change. Of great interest are their asymptotic distributions under
alternative hypotheses of change.

The sample comparison function $D(u; G, \tilde{F})$ of the sample distribution function $\tilde{F}$
with a model $G$, when the sample of size $n_1$ has true distribution function $F$, has asymptotic
distribution

$$n_1^{.5}\left(D(u; G, \tilde{F}) - D(u; G, F)\right) \to B_F(D(u; G, F)),$$

where

$$B_F(u) = n_1^{.5}\left(D(u; F, \tilde{F}) - u\right)$$

is called the empirical process of the sample and is approximately a Brownian Bridge.
Under  suitable  conditions 

-  (^0-'KG,F))(-l)Bf(x) 

The comparison of the sample up to time $\tau$ (of size $n\tau$) with the whole sample (of size
$n$), under the changepoint assumption that the sample up to time $\tau$ and the sample after
time $\tau$ have respective true distributions $F([0, \tau], y)$ and $F([\tau, 1], y)$ and the pooled sample
has distribution $F_Y(y) = F([0, 1], y)$, has an asymptotic distribution for fixed $\tau$ suggested by
Pyke-Shorack theory for two samples; as processes on $0 < u < 1$,

$$n^{.5}\,\tau\left(\tilde{D}([0, \tau], u) - D([0, \tau], u)\right) \to \tau d([0, \tau], u)\,B_{[\tau, 1]}((1 - \tau)D([\tau, 1], u)) - (1 - \tau d([0, \tau], u))\,B_{[0, \tau]}(\tau D([0, \tau], u)),$$

where $B_{[0, \tau]}(u)$ and $B_{[\tau, 1]}(u)$ are the empirical processes of the samples before and after $\tau$
respectively. Note that $B(\tau D)$ symbolizes $\tau^{.5}B(D)$, and

$$\tau D([0, \tau], u) + (1 - \tau)D([\tau, 1], u) = u.$$

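From Ruymgaart (1974) we obtain results when $(X, Y)$ has a continuous bivariate
distribution:

$$n^{.5}\left(\int_0^1\!\!\int_0^1 K(\tau)J(u)\,d\tilde{D}(\tau, u) - \int_0^1\!\!\int_0^1 K(\tau)J(u)\,dD(\tau, u)\right) \to \mathrm{Normal}\left(0, \int_0^1\!\!\int_0^1 |V(\tau, u)|^2\,dD(\tau, u)\right),$$

defining

$$V(\tau, u) = K(\tau)J(u) - \int_0^1\!\!\int_0^1 K(t)J(s)\,dD(t, s) + \int_0^1\!\!\int_0^1 K(t)[e(s - u) - s]J'(s)\,dD(t, s) + \int_0^1\!\!\int_0^1 K'(t)[e(t - \tau) - t]J(s)\,dD(t, s),$$

where $e(x) = 1$ or $0$ as $x > 0$ or $x < 0$. Under the null hypothesis that $X$ and $Y$ are
independent, $D(\tau, u) = \tau u$,

$$V(\tau, u) = \left(K(\tau) - \int_0^1 K(t)\,dt\right)\left(J(u) - \int_0^1 J(s)\,ds\right),$$

and

$$n^{.5}(\tilde{D}(\tau, u) - D(\tau, u)) \to B(\tau, u).$$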


Note that (Weiss (1964))

$$\mathrm{Cov}\left[n^{.5} f_X Q_X(\tau)\,(\tilde{Q}_X(\tau) - Q_X(\tau)),\; n^{.5} f_Y Q_Y(u)\,(\tilde{Q}_Y(u) - Q_Y(u))\right]$$

is asymptotically $D(\tau, u) - \tau u$.


3.  ONE  WAY  ANALYSIS  OF  VARIANCE  (AOV) 

Change analysis provides new graphical data analysis interpretations of classical statistical
methods. The one way analysis of variance (AOV) tests the equality of distributions
of variables (or populations) $Y_1, \ldots, Y_c$ under the assumption that they are independent
and their distributions satisfy

$$Y_j \text{ is Normal}(\mu_j, \sigma^2), \quad j = 1, \ldots, c.$$
Note that if one has observed values $Y(t)$, $t = 1, \ldots, n$, of a variable $Y$, the variables
$Y_j$ could represent the values of $Y(t)$ for the $j$-th time segment $T_{j-1} < t \le T_j$, where
$0 = T_0 < T_1 < \ldots < T_c = n$ are specified by the statistician.

The parameters to be estimated are $\mu_1, \ldots, \mu_c, \sigma^2$. The basic hypothesis to be tested
is the hypothesis of homogeneity

$$H_0: \mu_1 = \ldots = \mu_c = \mu.$$

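For $j = 1, \ldots, c$, one observes $n_j$ values of $Y_j$ denoted $Y_{j1}, \ldots, Y_{jn_j}$ with sample mean

$$\tilde{Y}_{j\cdot} = (1/n_j)\sum_{i=1}^{n_j} Y_{ji}$$

and sample variance

$$S_j^2 = (1/n_j)\sum_{i=1}^{n_j} (Y_{ji} - \tilde{Y}_{j\cdot})^2.$$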

The pooled sample of all the data has size $n = n_1 + \ldots + n_c$. The proportion of the
pooled sample from the $j$-th sample is

$$p_j = n_j/n;$$

the cumulative proportions are denoted

$$\tau_j = p_1 + \ldots + p_j.$$

We introduce a variable $X$ to represent the population $j = 1, \ldots, c$ from which an
observation $Y_{ji}$ is made. An observation is $(X, Y)$. The sample probability and distribution
of $X$ are

$$\tilde{p}_X(j) = p_j, \qquad \tilde{F}_X(j) = \tau_j.$$

The variable $X$ is not a random variable, but we condition $Y$ by $X$ using sample
(empirical) probabilities rather than population (ensemble) probabilities. We find it an aid
to understanding to use an alternate notation for $\tilde{Y}_{j\cdot}$ and $S_j^2$ as the sample conditional
mean and variance of $Y$ given $X = j$:

$$\tilde{E}[Y \mid X = j] = \tilde{Y}_{j\cdot},$$

$$\widetilde{\mathrm{VAR}}[Y \mid X = j] = S_j^2.$$

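The pooled sample has sample mean $\tilde{Y}_{\cdot\cdot}$ and sample variance $S_Y^2$ which can be
interpreted as the unconditional mean and variance of $Y$:

$$\tilde{E}[Y] = \tilde{Y}_{\cdot\cdot} = \sum_{j=1}^c p_j \tilde{Y}_{j\cdot},$$

$$\widetilde{\mathrm{VAR}}[Y] = S_Y^2 = (1/n)\sum_{j=1}^c \sum_{i=1}^{n_j} (Y_{ji} - \tilde{Y}_{\cdot\cdot})^2.$$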
The theory of conditional expectation has important formulas

$$\mathrm{VAR}[Y] = \mathrm{VAR}[E[Y \mid X]] + E[\mathrm{VAR}[Y \mid X]],$$

$$\widetilde{\mathrm{VAR}}[Y] = \widetilde{\mathrm{VAR}}[\tilde{E}[Y \mid X]] + \tilde{E}[\widetilde{\mathrm{VAR}}[Y \mid X]].$$

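Analysis of variance tests $H_0$ by comparing various estimators of variance. The mean
conditional variance (denoted $S_{\mathrm{var}}^2$) and the variance of the conditional mean (denoted
$S_{\mathrm{mean}}^2$) are defined by

$$S_{\mathrm{var}}^2 = \tilde{E}[\widetilde{\mathrm{VAR}}[Y \mid X]] = \sum_{j=1}^c p_j S_j^2,$$

$$S_{\mathrm{mean}}^2 = \widetilde{\mathrm{VAR}}[\tilde{E}[Y \mid X]] = \sum_{j=1}^c p_j (\tilde{Y}_{j\cdot} - \tilde{Y}_{\cdot\cdot})^2.$$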

The traditional $F$ test statistic (denoted $FT$) for testing $H_0$ can be represented

$$FT = ((n - c)/(c - 1))\,F,$$

$$F = S_{\mathrm{mean}}^2/S_{\mathrm{var}}^2.$$

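An estimator of $\sigma^2$ in the AOV model is

$$\tilde{\sigma}^2 = (n/(n - c))\,S_{\mathrm{var}}^2;$$

it is unbiased since $E[\tilde{\sigma}^2] = \sigma^2$. The numerator of the $F$ statistic can be shown to have
mean

$$E[n S_{\mathrm{mean}}^2] = (c - 1)\sigma^2 + n\sum_{j=1}^c p_j(\mu_j - \mu)^2,$$

defining

$$\mu = \sum_{j=1}^c p_j \mu_j.$$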

The numerator and denominator of $F$ can be shown to be independent random variables;
therefore

$$E[FT] = 1 + (c - 1)^{-1}\,n\sum_{j=1}^c p_j(\mu_j - \mu)^2/\sigma^2.$$

This formula for the mean of $FT$ is used to justify why we should reject the hypothesis $H_0$
of equal means when $FT$ is too large; $FT > 2$ is a reasonable general criterion for rejecting
$H_0$. Akaike (1985) describes the emergence of the magic number 2. The critical value
of $FT$ is exactly determined from the fact that it obeys an $F$ distribution with $(c - 1, n - c)$
degrees of freedom.

Data analysis by analysis of variance is usually presented as an AOV table.
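The conditional-variance form of the $F$ test can be sketched in a few lines. The code below assumes the divisor-$n_j$ sample variances used in this section; the function name `aov_f_statistic` and the toy data are illustrative only.

```python
def aov_f_statistic(groups):
    """Return (S2_mean, S2_var, FT) from the conditional mean/variance decomposition."""
    n = sum(len(g) for g in groups)
    c = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    s2_mean = 0.0  # variance of the conditional mean
    s2_var = 0.0   # mean conditional variance
    for g in groups:
        p = len(g) / n                                     # proportion p_j
        m = sum(g) / len(g)                                # conditional mean
        s2 = sum((v - m) ** 2 for v in g) / len(g)         # divisor n_j, as in the text
        s2_mean += p * (m - grand_mean) ** 2
        s2_var += p * s2
    ft = (n - c) / (c - 1) * (s2_mean / s2_var)
    return s2_mean, s2_var, ft

s2_mean, s2_var, ft = aov_f_statistic([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(ft, 6))  # 13.0
```

For these data the classical AOV table gives a between-groups mean square of 13 and a within-groups mean square of 1, so the two routes to $FT$ agree.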

4.  CHANGE  ANALYSIS  APPROACH  TO  AOV 

The change analysis approach to AOV provides graphical analysis of the standardized
data

$$Y^* = (Y - \tilde{E}[Y])/S_Y$$

by forming processes on the unit interval $0 < \tau < 1$ defined as follows:

change density: $\tilde{c}(\tau) = \tilde{Y}^*_{j\cdot} = (\tilde{Y}_{j\cdot} - \tilde{Y}_{\cdot\cdot})/S_Y$, $\tau_{j-1} < \tau \le \tau_j$;

change process: $\tilde{C}(\tau) = \int_0^\tau \tilde{c}(s)\,ds$;

change test process: $\widetilde{CT}(\tau) = \tilde{C}(\tau)/(\tau(1 - \tau))^{.5}$;

change test density: $\widetilde{cT}(\tau) = \tilde{c}(\tau)\,(p_j/(1 - p_j))^{.5}$, $\tau_{j-1} < \tau \le \tau_j$.

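The change process is linear between its values at $\tau = \tau_j$:

$$\tilde{C}(\tau_j) = \sum_{i=1}^j p_i\,(\tilde{Y}_{i\cdot} - \tilde{Y}_{\cdot\cdot})/S_Y = \tau_j\,\tilde{E}[Y^* \mid X \le j].$$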

The process $n^{.5}\tilde{C}(\tau)$, $0 < \tau < 1$, can be shown to have an asymptotic distribution
under $H_0$ at the “exact” values $0 = \tau_0 < \tau_1 < \ldots < \tau_c = 1$ which is the same as
the distribution of a Brownian Bridge stochastic process $B(\tau)$, $0 < \tau < 1$, a zero mean
Gaussian process with covariance kernel

$$E[B(\tau_1)B(\tau_2)] = \min(\tau_1, \tau_2) - \tau_1\tau_2.$$

We call $\tilde{C}(\tau)$ a dynamic statistic since the significance of its graph can be determined
by thinking of it as a sample path of a Brownian Bridge. We also relate its graph to
various deterministic shapes it could have under various assumptions about the values of
the means $\mu_j$.

Graphical data analysis of $\tilde{C}(\cdot)$ can often indicate whether to accept or reject $H_0$.
To obtain “p values” for the level at which $H_0$ is rejected or accepted we need to form
functionals of the process.

Theorem: The important functional

$$R^2 = \int_0^1 |\tilde{c}(\tau)|^2\,d\tau$$

can be related to the traditional $F$ test statistic $FT$ by

$$FT = ((n - c)/(c - 1))\,F, \qquad F = R^2/(1 - R^2).$$


Proof: Verify that

$$R^2 = S_{\mathrm{mean}}^2/S_Y^2,$$

$$S_Y^2 = S_{\mathrm{mean}}^2 + S_{\mathrm{var}}^2,$$

$$1 - R^2 = S_{\mathrm{var}}^2/S_Y^2.$$
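The relation between the change density, $R^2$, and $FT$ can be checked numerically. This is a sketch under the definitions of this section (group means standardized by the pooled standard deviation $S_Y$ with divisor $n$); the function name and toy data are hypothetical.

```python
import math

def aov_change_process(groups):
    """Return (taus, density, process, r2) for the AOV change analysis.

    density[j] is the change density on (tau_j, tau_{j+1}] (0-based j),
    process[j] is the change process at the cumulative proportion taus[j].
    """
    n = sum(len(g) for g in groups)
    pooled = [v for g in groups for v in g]
    grand_mean = sum(pooled) / n
    s_y = math.sqrt(sum((v - grand_mean) ** 2 for v in pooled) / n)
    taus, density, process = [0.0], [], [0.0]
    for g in groups:
        p = len(g) / n
        c_j = (sum(g) / len(g) - grand_mean) / s_y  # standardized group mean
        density.append(c_j)
        taus.append(taus[-1] + p)
        process.append(process[-1] + p * c_j)       # linear between the tau_j
    r2 = sum((len(g) / n) * c_j ** 2 for g, c_j in zip(groups, density))
    return taus, density, process, r2

taus, density, process, r2 = aov_change_process([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
ft = (9 - 3) / (3 - 1) * r2 / (1 - r2)
print(round(r2, 4), round(ft, 4))  # 0.8125 13.0
```

The change process returns to zero at $\tau = 1$ because the proportion-weighted standardized group means sum to zero, and $F = R^2/(1 - R^2)$ reproduces the same $FT$ as the classical table for these data.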


The distribution of $R^2$ under $H_0$ is analogous to sample correlation; therefore we call
$R^2$ a correlation statistic to distinguish it from an $F$ statistic of the form $F = R^2/(1 - R^2)$.

$F$ tests (and $R^2$ tests) are “portmanteau” statistics which should be represented in
terms of diagnostic statistics which can help indicate which part of the data is the cause
of rejection of the null hypothesis. For this purpose we introduce “two sample statistics”
for the no-change hypotheses

$H_{j\le}$: The pooled sample of variables $Y_1, \ldots, Y_j$ has the same distribution as the pooled
sample of variables $Y_{j+1}, \ldots, Y_c$;

$H_{j=}$: The variable $Y_j$ has the same distribution as the pooled sample which does not
include $Y_j$.
Denote by $TT_{j\le}$ the two sample t-test statistic for $H_{j\le}$ and denote by $TT_{j=}$ the two
sample t-test statistic for $H_{j=}$. One can show

$$TT_{j=} = ((n - c)\,p_j/(1 - p_j))^{.5}\,(\tilde{Y}_{j\cdot} - \tilde{Y}_{\cdot\cdot})/S_{\mathrm{var}},$$

$$TT_{j\le} = ((n - c)\,\tau_j/(1 - \tau_j))^{.5}\,(\tilde{E}[Y \mid X \le j] - \tilde{Y}_{\cdot\cdot})/S_{\mathrm{var}}.$$

Therefore

$$TT_{j\le} = ((n - c)/(1 - R^2))^{.5}\,\widetilde{CT}(\tau_j),$$

$$TT_{j=} = ((n - c)/(1 - R^2))^{.5}\,\widetilde{cT}(\tau_j).$$

The portmanteau $F$ test statistic can be expressed

$$FT = (c - 1)^{-1}\sum_{j=1}^c (1 - p_j)\,|TT_{j=}|^2.$$
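The decomposition of the portmanteau $F$ statistic into per-group diagnostics can be sketched as follows. The code assumes that each diagnostic compares a group mean with the pooled mean and is scaled by the square root of the mean conditional variance, which is the scaling under which the stated identity for $FT$ holds; the function name and data are illustrative.

```python
import math

def abrupt_change_statistics(groups):
    """Return the per-group statistics TT_{j=} and the F statistic rebuilt from them."""
    n = sum(len(g) for g in groups)
    c = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    props = [len(g) / n for g in groups]
    # Mean conditional variance (divisor n_j within each group).
    s2_var = sum(p * sum((v - m) ** 2 for v in g) / len(g)
                 for g, m, p in zip(groups, means, props))
    s_var = math.sqrt(s2_var)
    tt_eq = [math.sqrt((n - c) * p / (1 - p)) * (m - grand_mean) / s_var
             for m, p in zip(means, props)]
    ft = sum((1 - p) * t ** 2 for p, t in zip(props, tt_eq)) / (c - 1)
    return tt_eq, ft

tt_eq, ft = abrupt_change_statistics([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print([round(t, 3) for t in tt_eq], round(ft, 6))  # [-3.536, -1.414, 4.95] 13.0
```

The largest diagnostic in absolute value points at the third group as the main source of the large $FT$, which is the kind of information the portmanteau statistic alone does not give.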


5.  COMPONENTS  OF  CHANGE  PROCESSES 

We call the two-sample t statistics $TT$ “abrupt change” statistics since they test
hypotheses of an abrupt change. We would like statistics that test for smooth change
(such as linear or quadratic). Natural test statistics are linear functionals in the change
density process, called components $T(K)$ or $[\tilde{c}, K]$ with score function $K(\tau)$, defined by

$$T(K) = \int_0^1 K(\tau)\,\tilde{c}(\tau)\,d\tau.$$

The identity score function $K(\tau) = \tau$ yields the Wilcoxon type statistic

$$T(\tau) = \sum_{j=1}^c \tilde{Y}^*_{j\cdot}\,p_j\,(.5)(\tau_{j-1} + \tau_j).$$

A general approximation for a component is

$$T(K) = \sum_{j=1}^c \tilde{Y}^*_{j\cdot}\,p_j\,K(\tau_j^{\mathrm{mid}}),$$

defining

$$\tau_j^{\mathrm{mid}} = .5(\tau_{j-1} + \tau_j) = \tau_j - .5\,p_j.$$

To express the statistics $TT_{j=}$ and $TT_{j\le}$ as components we first define score functions

$K_{j=}(\tau) = 1/p_j$ for $\tau_{j-1} < \tau \le \tau_j$, $= 0$ otherwise;

$K_{j\le}(\tau) = 1$ for $0 < \tau \le \tau_j$, $= 0$ otherwise.

It can be shown that under $H_0$ a component $T(K)$ is asymptotically normal with mean 0
and variance

$$\mathrm{Norm}(K)^2 = \int_0^1 \left|K(\tau) - \int_0^1 K(s)\,ds\right|^2 d\tau.$$
Jo  Jo 
The identity score function $K(\tau) = \tau$ has norm squared 1/12. Therefore the asymptotically
Normal(0,1) version of the Wilcoxon type test statistic is the component $T(12^{.5}\tau)$.

The score functions corresponding to the $TT$ statistics have norms squared

$$\mathrm{Norm}(K_{j\le})^2 = \tau_j(1 - \tau_j),$$

$$\mathrm{Norm}(K_{j=})^2 = (1 - p_j)/p_j.$$

Consequently one can represent the two sample t statistics as components

$$TT_{j\le} = T(K_{j\le}/\mathrm{Norm}(K_{j\le})),$$

$$TT_{j=} = T(K_{j=}/\mathrm{Norm}(K_{j=})).$$
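The norms quoted for these score functions can be verified numerically. The sketch below approximates the centered squared norm of a score function on the unit interval by a midpoint rule; `component_norm_squared` is a hypothetical helper and the cut point .3 is an arbitrary illustration.

```python
def component_norm_squared(score, steps=20000):
    """Midpoint-rule approximation of the integral of (K - integral of K)^2 over (0, 1)."""
    h = 1.0 / steps
    values = [score((k + 0.5) * h) for k in range(steps)]
    mean = sum(values) * h                          # integral of K
    return sum((v - mean) ** 2 for v in values) * h

# Identity score K(tau) = tau: norm squared should be 1/12.
identity_norm2 = component_norm_squared(lambda t: t)

# Cumulative indicator score with cut point .3: norm squared should be .3 * .7.
indicator_norm2 = component_norm_squared(lambda t: 1.0 if t <= 0.3 else 0.0)

# Interval score equal to 1/p on an interval of length p = .25: norm squared (1 - p)/p = 3.
interval_norm2 = component_norm_squared(lambda t: 4.0 if 0.5 < t <= 0.75 else 0.0)
```

Dividing a score function by the square root of its norm squared gives a component with unit asymptotic variance, which is what allows different components to be compared on one scale.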

6.  FOUR  PHASES  OF  CHANGE  ANALYSIS 

A sample change process $\tilde{C}(\tau)$, $0 < \tau < 1$, is a dynamic statistic (sample path of a
stochastic process) which often can be shown to satisfy under the null hypothesis of “no
change” the null hypothesis $H_0$: $\tilde{C}(\cdot)$ is a Brownian Bridge (or a related hypothesis). The
statistical analysis of $\tilde{C}(\cdot)$ has four phases:

Phase 1: Graphical analysis. Is the plot of $\tilde{C}(\tau)$, $0 < \tau < 1$, oscillatory, a deterministic
parabola, or some other pattern?

Phase 2: Non-linear functionals. One tests $H_0$ by computing the values of test statistics
(whose asymptotic distributions under $H_0$ can be deduced from the theory of
empirical processes)

$$\int_0^1 |\tilde{C}(\tau)|^2\,d\tau,$$

$$\int_0^1 \left(|\tilde{C}(\tau)|^2/\tau(1 - \tau)\right)d\tau,$$

$$\max_{0 < \tau < 1} |\tilde{C}(\tau)|,$$

$$\max_{\tau = j/n} |\tilde{C}(\tau)|/\tau(1 - \tau).$$
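The four Phase 2 functionals can be approximated on the grid $\tau = j/n$. The sketch below makes several assumptions for illustration: the change process is built from observations standardized by the sample standard deviation, the integrals are replaced by Riemann sums over interior grid points, and the weight in the last functional is taken exactly as printed above. The function name is hypothetical.

```python
import math

def phase_two_functionals(y):
    """Grid approximations of the Phase 2 functionals of the sample change process."""
    n = len(y)
    mean = sum(y) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in y) / n)
    cusum, path = 0.0, []
    for v in y:
        cusum += (v - mean) / (sd * n)
        path.append(cusum)                                  # C~(j/n), j = 1..n
    # Interior grid points only, so that tau * (1 - tau) is never zero.
    interior = [(j / n, path[j - 1]) for j in range(1, n)]
    return {
        "integral_sq": sum(c * c for _, c in interior) / n,
        "weighted_integral_sq": sum(c * c / (t * (1 - t)) for t, c in interior) / n,
        "max_abs": max(abs(c) for _, c in interior),
        "max_weighted": max(abs(c) / (t * (1 - t)) for t, c in interior),
    }

results = phase_two_functionals([0, 0, 0, 0, 1, 1, 1, 1])
print(results["max_abs"], results["max_weighted"])  # 0.5 2.0
```

To compare with Brownian Bridge limits, the process would be rescaled by $n^{.5}$ (and the squared functionals by $n$); the sketch leaves the values unscaled.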

Phase 3: Linear functionals. For various score functions $K(\tau)$, called change score
functions, one computes the linear functional (or component)

$$\tilde{C}(K) = \int_0^1 K(\tau)\,d\tilde{C}(\tau) = \int_0^1 K(\tau)\,\tilde{c}(\tau)\,d\tau.$$

One can often write approximately

$$\tilde{C}(K) = (1/n)\sum_{j=1}^n K((j - .5)/n)\,\tilde{c}((j - .5)/n).$$

The score function is usually chosen as a sequence of orthonormal functions
$\psi_1(\cdot), \psi_2(\cdot), \ldots$, especially the Legendre polynomials, which test against patterns in the
change density $\tilde{c}(\tau)$.
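A short sketch of Legendre components follows. It uses the first three orthonormal (shifted) Legendre polynomials on the unit interval and the midpoint approximation of the component; standardizing the data by the sample mean and standard deviation is an assumption of the example, and the function name is hypothetical.

```python
import math

# Orthonormal shifted Legendre polynomials on (0, 1): linear, quadratic, cubic.
LEGENDRE = [
    lambda t: math.sqrt(3) * (2 * t - 1),
    lambda t: math.sqrt(5) * (6 * t * t - 6 * t + 1),
    lambda t: math.sqrt(7) * (20 * t ** 3 - 30 * t * t + 12 * t - 1),
]

def legendre_components(y):
    """Approximate (1/n) * sum_j psi((j - .5)/n) * c~((j - .5)/n) for each psi."""
    n = len(y)
    mean = sum(y) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in y) / n)
    density = [(v - mean) / sd for v in y]          # sample change density on the grid
    return [sum(psi((j + 0.5) / n) * density[j] for j in range(n)) / n
            for psi in LEGENDRE]

components = legendre_components(list(range(1, 11)))  # a pure linear trend
print([round(v, 4) for v in components[:2]])
```

For a pure linear trend the linear component is close to its maximum value of 1 and the quadratic component vanishes by symmetry, so the components separate the kind of smooth change present in the data.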

The key to change analysis is to choose transformations of data (score the data) which
are most powerful for detecting change. From the sample change processes, suitable linear
functionals (score the change) are formed. These linear functionals are called “double score
components”. One can define bivariate density functions $d(\tau, u)$, $0 < \tau < 1$, $0 < u < 1$, of
which double score functions are diagnostics. Choice of data score functions is motivated
in sections 8 and 7 parametrically and non-parametrically, respectively.

Phase 4: Density estimation. By one of the many methods available in the theory of
curve smoothing (kernel methods, splines, exponential methods, wavelets, etc.) form
a smooth estimator $\hat{c}(\tau)$ of the change density.

An  exposition  of  the  theory  of  these  phases  would  require  a  book  and  is  beyond  the 
scope  of  this  paper.  Our  goal  in  this  paper  is  to  outline  the  phases  and  to  explain  how  we 
choose  transformations  of  the  original  data  from  which  to  form  a  change  process. 

7.  NONPARAMETRIC  STATISTICS  MULTI-SAMPLE  ANALYSIS 

To test the equality of $c$ samples, non-parametric statistics starts by transforming
each observation $Y_{ji}$ to its “mid-rank” in the pooled sample. Let $\tilde{F}_Y$ and $\tilde{p}_Y$ denote
the sample distribution and probability mass functions in the pooled sample. Define the
mid-distribution function

$$\tilde{F}_Y^{\mathrm{mid}}(y) = \tilde{F}_Y(y) - .5\,\tilde{p}_Y(y).$$

Let $J(u)$, $0 < u < 1$, be a score function. Transform $Y_{ji}$ to

$$Z_{ji} = J(\tilde{F}_Y^{\mathrm{mid}}(Y_{ji})).$$

Our definition of transformed data $Z$ handles tied data and discrete data without
extra effort. Traditional definitions assume all values in the pooled sample are distinct,
and transform $Y_{ji}$ to

$$Z_{ji} = J(n\tilde{F}_Y(Y_{ji})/(n + 1)) = J(R_{ji}/(n + 1)),$$

where $R_{ji}$ is the rank in the pooled sample of $Y_{ji}$.

We calculate for the transformed data $Z$ the correlation type statistic $R_Z^2$ from $Z$
values in exactly the same way that $R^2$ was calculated from $Y$ values. Asymptotically for
$J(u) = u$, $S_Z^2 = 1/12$, so that we could define

$$R_Z^2 = 12\sum_{j=1}^c p_j\,(\tilde{Z}_{j\cdot} - \tilde{Z}_{\cdot\cdot})^2.$$

Note $\tilde{Z}_{\cdot\cdot} = .5$. The Kruskal-Wallis statistic equals $(n + 1)R_Z^2$,

$$R_Z^2 = 12\left(\sum_{j=1}^c (n_j/n)\,\tilde{Z}_{j\cdot}^2 - .25\right),$$

where $\tilde{Z}_{j\cdot}$ is the rank average in the $j$-th group, traditionally computed

$$\tilde{Z}_{j\cdot} = (1/n_j)\sum_{i=1}^{n_j} R_{ji}/(n + 1).$$
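The relation between the correlation type statistic and the Kruskal-Wallis statistic can be checked on a small example. The sketch uses the traditional scores $R_{ji}/(n + 1)$ and therefore assumes all pooled values are distinct; the classical formula used for comparison is the usual rank-sum form of the Kruskal-Wallis statistic. Function names and data are illustrative.

```python
def rank_correlation_statistic(groups):
    """R_Z^2 = 12 * sum_j p_j * (Zbar_j - .5)^2 with Z = rank / (n + 1); no ties assumed."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank = {v: i + 1 for i, v in enumerate(pooled)}   # ranks 1..n of distinct values
    r2 = 0.0
    for g in groups:
        z_bar = sum(rank[v] / (n + 1) for v in g) / len(g)
        r2 += 12 * (len(g) / n) * (z_bar - 0.5) ** 2
    return r2

def kruskal_wallis_classical(groups):
    """Classical form: 12 / (n (n + 1)) * sum_j (rank sum_j)^2 / n_j - 3 (n + 1)."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    total = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

data = [[1.1, 2.3, 3.0], [4.2, 5.5, 6.1], [7.7, 8.4, 9.9]]
r2_z = rank_correlation_statistic(data)
kw = kruskal_wallis_classical(data)
print(round((9 + 1) * r2_z, 6), round(kw, 6))  # 7.2 7.2
```

With mid-distribution scores in place of $R_{ji}/(n + 1)$ the same computation extends to tied data, although the multiplying constant then differs slightly from $n + 1$.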
Traditional non-parametric statistics computes only numerical test statistics, such as those
corresponding to a score function $J(u) = u$. The change analysis approach to non-parametric analysis
with score function $J(u) = u$ starts with a change density defined by

$$\tilde{c}_Z(\tau) = 12^{.5}\,(\tilde{Z}_{j\cdot} - .5), \qquad \tau_{j-1} < \tau \le \tau_j,$$

does graphical data analysis of its change process $\tilde{C}_Z(\tau)$, and computes double score
components $[K, J]$.

8.  FISHER-SCORE  CHANGE  PROCESSES 

To detect change over time in a sequence one must have some prior opinion about the
ways in which the probability distribution of the observations may be changing (such as in
location, scale, skewness, etc.). Sample change processes are formed for transformed data,
where the transformation is called intuitively a data score function. The most powerful data
transformations are essentially the sufficient statistics, or more precisely the Fisher score
functions, when one has a parametric model $f(y; \theta)$ for a random sample $Y(t)$, $t = 1, \ldots, n$,
where $\theta = (\theta_1, \ldots, \theta_k)$.

The maximum likelihood estimator $\hat{\theta}$ is obtained by maximizing the average
log-likelihood

$$L(\theta) = (1/n)\sum_{t=1}^n \log f(Y(t); \theta).$$

Define score functions

$$S_j(Y; \theta) = (\partial/\partial\theta_j)\log f(Y; \theta).$$

The maximum likelihood estimator is the solution of the estimating equations, for $j = 1, \ldots, k$,

$$(1/n)\sum_{t=1}^n S_j(Y(t); \hat{\theta}) = 0.$$

Our approach to change analysis asks if for every potential changepoint $\tau = m/n$ the
parametric model with $\theta = \hat{\theta}$ fits the data $Y(t)$, $t = 1, \ldots, m$, up to the time $m$ in the
sense that approximately

$$(1/m)\sum_{t=1}^m S_j(Y(t); \hat{\theta}) = 0.$$

We define the Fisher-score change process to linearly interpolate its values at $\tau = m/n$,
for $m = 1, \ldots, n$,

$$\tilde{C}(\tau; S_j) = (1/n)\sum_{t=1}^m S_j^*(Y(t); \hat{\theta}),$$

where

$$S_j^*(Y; \hat{\theta}) = S_j(Y; \hat{\theta})/\left(E_{\hat{\theta}}\left[|S_j(Y; \hat{\theta})|^2\right]\right)^{.5}.$$

We form $k$ Fisher-score change processes, for $j = 1, \ldots, k$.

We  call  this  approach  “random  walk  (or  CUSUM)  your  normalized  scores.”  We  are 
developing  the  probability  theory  of  the  Fisher-score  change  processes. 

These theoretical concepts can best be understood through examples. Consider a
gamma distribution model

$$f(y; \nu, \theta) = (\theta^\nu\,\Gamma(\nu))^{-1}\,y^{\nu - 1}\exp(-y/\theta),$$

where $\theta$ is a positive scale parameter, assumed unknown, and $\nu$ is a positive shape parameter,
assumed known. One can show that the score function of the parameter $\theta$ is

$$s(Y; \theta) = (1/\theta)((Y/\theta) - \nu);$$

the maximum likelihood estimator is

$$\hat{\theta} = \tilde{Y}/\nu;$$

the normalized score function evaluated at the maximum likelihood estimator of the
parameter may be shown to be

$$s^*(Y(t); \hat{\theta}) = \nu^{.5}\,((Y(t)/\tilde{Y}) - 1).$$

To test the observations $Y(\cdot)$ for change, one forms the maximum likelihood score
change process $\tilde{C}(\tau; s^*)$, $0 < \tau < 1$, and tests if this dynamic statistic is significantly
different from a sample path of a Brownian Bridge stochastic process. A linear functional
of the change process corresponding to the score function

$$K(\tau) = 12^{.5}\,(\tau - .5)$$

is

$$\tilde{C}(K, S^*) = (1/n)\sum_{t=1}^n (12\nu)^{.5}\,((Y(t)/\tilde{Y}) - 1)\,((t - .5)/n)$$

$$= (12\nu)^{.5}\left((1/n)\sum_{t=1}^n Y(t)\,((t - .5)/n)/\tilde{Y} - .5\right).$$

Under the null hypothesis of no change the asymptotic distribution of $n^{.5}\tilde{C}(K, S^*)$ is
Normal(0,1).

An example of an application of this statistic is in Hsu (1979), where it is presented
as a test designed for a small change in the scale parameter $\theta$ of an independent Gamma
distributed sequence, derived by Kander and Zacks (1966) by a Bayesian analysis assuming
the changepoint $\tau$ is uniformly distributed in time. This test statistic is derived in our
approach as analogous to a component in standard goodness of fit analysis.
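The gamma example can be sketched numerically. The code below assumes a known shape parameter, uses the normalized score $\nu^{.5}((Y(t)/\tilde{Y}) - 1)$ at the maximum likelihood estimator, and forms the linear component with score function $12^{.5}(\tau - .5)$ evaluated at the midpoints $(t - .5)/n$. The function name and the toy data are hypothetical.

```python
import math

def gamma_score_component(y, shape):
    """Return (change process at m/n, linear component, n^{.5} * component)."""
    n = len(y)
    mean = sum(y) / n
    # Normalized Fisher score of the scale parameter, evaluated at the MLE.
    scores = [math.sqrt(shape) * (v / mean - 1) for v in y]
    process, cusum = [0.0], 0.0
    for s in scores:
        cusum += s / n
        process.append(cusum)                      # value at tau = m/n
    component = sum(math.sqrt(12) * ((t + 0.5) / n - 0.5) * scores[t]
                    for t in range(n)) / n
    return process, component, math.sqrt(n) * component

process, component, z = gamma_score_component([1, 1, 1, 1, 3, 3, 3, 3], shape=1.0)
print(round(component, 4), round(z, 4))  # 0.433 1.2247
```

Because the scores sum to zero at the maximum likelihood estimator, the change process returns to zero at $\tau = 1$, and centering the score function at .5 does not change the value of the component. The standardized value would be referred to a Normal(0,1) distribution, with the caveat that eight observations are far too few for the asymptotic approximation to be trusted.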